Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Free, publicly-accessible full text available May 4, 2026
-
Free, publicly-accessible full text available November 17, 2025
-
The dominance of machine learning and the ending of Moore’s law have renewed interests in Processor in Memory (PIM) architectures. This interest has produced several recent proposals to modify an FPGA’s BRAM architecture to form a next-generation PIM reconfigurable fabric [1], [2]. PIM architectures can also be realized within today’s FPGAs as overlays without the need to modify the underlying FPGA architecture. To date, there has been no study to understand the comparative advantages of the two approaches. In this paper, we present a study that explores the comparative advantages between two proposed custom architectures and a PIM overlay running on a commodity FPGA. We created PiCaSO, a Processor in/near Memory Scalable and Fast Overlay architecture as a representative PIM overlay. The results of this study show that the PiCaSO overlay achieves up to 80% of the peak throughput of the custom designs with 2.56× shorter latency and 25% – 43% better BRAM memory utilization efficiency. We then show how several key features of the PiCaSO overlay can be integrated into the custom PIM designs to further improve their throughput by 18%, latency by 19.5%, and memory efficiency by 6.2%.more » « less
-
“Active structures” are physical structures that incorporate real-time monitoring and control. Examples include active vibration damping or blast mitigation systems. Evaluating physics-based models in real-time is generally not feasible for such systems having high-rate dynamics which require microsecond response times, but data-driven machine-learning-based models can potentially offer a solution. This paper compares the cost and performance of two FPGA-based implementations of real-time, continuously-trained models for forecasting timeseries signals with non-stationarities, with one using HighLevel Synthesis (HLS) and the other a programmable overlay architecture. The proposed model accepts a uni-variate vibration signal and seeks to forecast future samples to inform highrate controllers. The proposed forecasting method performs two concurrent neural inference operations. One inference forecasts the state of the signal f samples into the future as a function of the most recent h samples, while the other forecasts the current sample given h samples starting from h + f − 1 samples into the past. The first forecast produces the forecast while the second forecast allows the system to calculate the model’s loss and perform an immediate model update before the next sample period.more » « less
An official website of the United States government

Full Text Available